Glassy transition in a disordered model for the RNA secondary structure 
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We numerically study a disordered model for the RNA secondary structure and we find that it 
undergoes a phase transition, with a breaking of the replica symmetry in the low temperature region 
(like in spin glasses). Our results are based on the exact evaluation of the partition function. 
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The folded structure of biopolymers, like RNA and 
proteins, is crucial for understanding the biological func- 
tionality of these molecules |IJ] and its characterization 
still remains a challenging problem in statistical mechan- 
ics and theoretical biology 0. The folding problem 
usually consists in understanding if and how a partic- 
ular biomolecule (maybe one selected by evolution and 
present now in nature) folds into its native conformation. 
In this Letter we are interested in the characterization of 
the most generic (i.e. random) RNA molecules. Even 
if real RNAs are not completely random, they present a 
very large variability in their sequences and no strong cor- 
relations in their bases. The interest in studying the lim- 
iting and somehow unphysical case of really random se- 
quences arises in order to answer the following questions. 
Is the folding transition, that forces real biomolccules 
into their functional shapes, characteristic of those se- 
quences selected by the evolution? Do random sequences 
show some phase transition too? We answer affirmatively 
to both questions, showing that the transition depends 
more on the geometrical constraints and on the interac- 
tion energies spread rather than on the specific sequence. 
However in the random case the transition is of a glassy 
type and the low-temperature phase is not dominated by 
a single native state. Our results may be very useful in 
order to understand better what could happen in a pre- 
biotic world mainly made of random RNA sequences || . 
Such a transition (partially found only in a very sim- 
plified model of proteins || ) was suggested in previous 
studies of the RNA folding @. 

In this Letter we first study the thermodynamical 
properties of random RNAs, finding some hints for the 
existence of a glassy transition. The clear evidence for 
such a transition is shown in the last part of the paper and 
has been obtained thanks to the typical tools of disorder 
systems statistical mechanics: spin glass susceptibility 
and a related parameter (see Fig. |J). The connection 
with complex systems is well expected: the model has 
both disorder and frustration. 

Generally speaking a classification among biopolymers 
includes a hierarchy of structures and in principle a com- 
plete description must include all these levels. RNA from 



this point of view is supposed to be simpler than DNA 
or proteins since its secondary structure seems to cap- 
ture the essential features of the thermodynamics of the 
molecule. RNA molecules are linear chains consisting of 
a sequence of four different bases: adenine (A), cytosine 
(C), guanine (G) and uracil (U). The four bases are re- 
lated by complementarity relations: G — G and A — U 
form stable base pairs with the formation of hydrogen 
bonds and are also known as Watson- Crick base pairs. 

The secondary structure of RNA is the set of base pairs 
that occur in its three-dimensional structure. Let us de- 
fine a sequence as 1Z = {n, r-2, ... , r n }, being the i th 
base and G {^4, C, G, U}. A secondary structure on 1Z 
is now defined as a set S of pairs (with the con- 
vention that 1 < i < j < n) according to the following 
rules: 

a ) j ~ i > 4 : this restriction permits flexibility of the 
chain in its three-dimensional arrangement. 

b) Two different base pairs € S if and only 
if (assuming with no loss of generality that i < i'): 

i < j < i' < j' : the pair precedes 

i < i' < j' < j : the pair (i, j) includes 

Condition b) avoids the formation of pseudo-knots on 
the structure and the resulting structure can be drawn on 
a plane. In real RNA structures it is known that pseudo- 
knots occur but are rare and they can be excluded as a 
first approximation 

The energy of a structure is simply defined as H[S] = 
S(j j)£S e ( r *' r j)- Other phenomenological parameters 
(including stacking energies and loop penalties) could be 
considered in order to take into account the whole com- 
plexity of the energy function . 

In our approach we assume a drastic approximation 
to the original model in order to improve its tractability 
both from numerical and analytical point of view. As 
a first step we consider sequences of only two symbols 
(A, B), that appear with equal probabilities, and we as- 
sume that only two kind of base pairs occurs: A — A and 
B—B pairs with energy —1 (in arbitrary units); A—B and 
B — A pairs with energy —2. It is reasonable to assume 
that such a reduction of symbols will not affect the ther- 
modynamical class of criticality of the model (this claim 
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is supported by numerical results we have obtained with a 
4-letter code and Watson-Crick base pairs). We did not 
remove the constraint which forbids the links on short 
distances, but we simplify it to: j — i > 2. We think that 
this topological constraint must be kept in order to not 
drastically change the entropy of the model and then its 
thermodynamical behavior. In this model disorder (en- 
coded in the sequence 1Z) and frustration (induced by the 
planarity condition on S) are clearly distinct. We hope 
this could make the model analytically more manageable. 

The model can be formally considered as unidimen- 
sional and with long range interactions: the disorder giv- 
ing rise to different interactions strengths, all with the 
same sign (here the disorder does not induce frustration) , 
while the planarity condition making the long-distance 
links unlikely. We have numerically estimated that the 
probability of having a link between two bases a distance 
r apart goes down roughly like r~ 3 / 2 . 

The planar structure of the configurations and the sim- 
ple energy function chosen allow to write down |6) a re- 
cursion equation for the partition function of the subse- 
quence contained inside the base interval 



induced by the presence of a nearby transition, we obtain 
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with Zij — Zi^i = 1 Vi. Such a recursion relation is 
particularly effective since the time needed for the com- 
putation of Zi^l scales as 0(L 3 ). With a slight modifica- 
tion of the algorithm it is also possible to include similar 
recursions for the internal energy U = (H[S]) and its 
second moment = (H 2 [S}), where (■) is the usual 
average over the Gibbs-Boltzmann distribution. At this 
level all the observables actually depend on the sequence 
over which they have been calculated and, if we want to 
gain information on the class of universality of the model, 
we have to average them over all the random realizations 
of the sequence. 

In Fig. |l| we show the specific heat (averaged over the 
disorder) for sizes ranging from L = 128 to L = 1024. 
We note a very slow increasing of the peak height with 
the size, which seems not to diverge. There is no hint 
for a finite jump in C(T). This could be compared with 
the result by Bundschuh and Hwa Q who found a finite 
jump in the specific heat (note however that their model 
has an unique ground state, which dominates the frozen 
phase) . It is important to point out that in the tempera- 
ture region T ~ 0.15 — 0.2 the curves slightly cross them- 
selves and as a consequence the decrease of C (T) becomes 
steeper for larger sizes. One of the main effects of the dis- 
order is that the location on the temperature axis of the 
critical region becomes sample-dependent. A measure of 
the critical region width can be achieved from the sample 
to sample fluctuation of the temperature where the spe- 
cific heat has a peak (AT P ). We find that AT P oc . 
with u> = 0.26. If we assume that these fluctuations are 
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FIG. 1. The specific heat (and its second derivative in the 
lower inset) as a function of the temperature for different sizes. 
Upper inset: zero-temperature entropy versus size and the 
best power law fit. 

Since the model is unidimensional, a = 2—dv = — 1.9(1) 
and then the second derivative of the specific heat with 
respect to the temperature should display a very slow di- 
vergence or a finite jump. In fact, in the lower inset of 
Fig. [I], can be seen that the argument is fully supported 
by the data, which show the typical finite size behavior 
of a discontinuity. The clear crossing point of the data 
around T ~ 0.2 is supposed to be a signature for non- 
analyticities in thermodynamical potential. We note that 
such a point is located well below the peak temperature. 
This is a common feature in many disordered systems 
(e.g. spin glasses). Near this temperature also the en- 
tropy of the model has a crossing point, which signals a 
rapid shrink of the available phase space. 

Moreover the model has a finite zero-temperature en- 
tropy (see upper inset in Fig. [l]). The zero-temperature 
results have been obtain via an exact enumeration of 
all the ground states structures (GSS) for any given 
sequence. The number of GSS (i.e. the degeneracy) 
strongly depends on the sequence: for example, studying 
thousands of different sequences with length L — 256, we 
have found sequences with degeneracies ranging from 1 
to O(10 7 ). In the upper inset of Fig. [I] we show the zero- 
temperature entropy defined as S(T = 0) = \og(Af)/L, 
where Af is the GSS degeneracy and L is the sequence 
length, as a function of L. The line is the power law 
extrapolation, which tends to S(T = 0) = 0.0255(8) §. 

Since the model turns out to be highly degenerate in 
the low-temperature phase, the natural question is how 
these GSS are organized. It is quite obvious that a very 
different physical behavior may appear in a model whose 
GSS are all very similar (like an ordered or "ferromag- 
netic" behavior) compared to a model whose GSS are 
sparse over the whole configurational space. A more 
quantitative analysis can be achieved introducing the no- 
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tion of distance between structures and a classification 
based on these distances. To quantify the relative dis- 
tance between two structures, we have used the overlap, 
which is defined as 



L ^ 



(2) 



where the variable I 



(S) 



takes value 1 if sites i and j 



are connected in the S (6>') sequence and otherwise. By 
definition the overlap takes values in the interval [0, 1]. 
For any given disorder realization (i.e. sequence) 1Z we 
can define the zero-temperature probability distribution 
function (pdf) of the overlaps as 



Pn(q)= J2 S(q-q[S,S'}) 



(3) 



s,S'er T . 



being T-ji the GSS set. This definition can be easily gen- 
eralized to every temperature summing over all the struc- 
tures and weighting each term with the Gibbs-Boltzmann 
factor of S and S'. The usual classification of disordered 
systems ]To| ] is based upon the average pdf of the over- 
laps, the so-called P(q) = [Pji(q)], the average being 
taken over the disorder distribution function. 

We have calculated the P{q) at different temperatures, 
T € [0, 0.4]. While at T — we summed over the whole 
sets r-R., at finite temperatures we performed a Monte 
Carlo sampling of the structures in the spirit of Higgs || . 
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FIG. 2. The P{q) for different temperatures. Insets: the 
size dependence of P(q) in the high (left) and low (right) 
temperature phases. 

In Fig. H the averaged P(q) are shown. The first strik- 
ing evidence is that, decreasing the temperature, the 
shape of the P(q) changes abruptly from a narrow peak in 
the low-q region to a broader one which extends over al- 
most the whole allowed support. In the insets we present 
the size dependence of P(q) for the highest and lowest 
temperature considered. For the T = 0.4 case, we are 
highly confident that the thermodynamical limit would 
be a delta function (the width of the distribution goes to 
zero as Aq oc L~ x / 2 ). For the T = case, the asymptotic 



shape is much more difficult to be extrapolated, since the 
width of the P(q) scales with a small power of L (as in §) 
and, eventually, we can not exclude that it goes to a finite 
value, implying a breaking of the replica symmetry. 

While in Fig. | the averaged P(q) gives us information 
about the typical pdf of the overlaps, we can get some 
hints about the origin of the P(q) broadness in the low- 
temperature phase analyzing directly the P-n{q) for each 
sequence. If all the GSS of a given sample are very similar 
its Piz(q) will be non-zero only in a narrow g-range not 
too far from the upper bound q = 1. On the other hand, 
if the GSS are very heterogeneous their mutual overlaps 
will cover a large q-range. 

The great majority of the sequences shows a very 
broad Pn(q), signaling a strong heterogeneity in the 
GSS. Moreover the shape of the pdf completely changes 
from sequence to sequence (this property is called non- 
self-averageness in spin glass jargon Nevertheless 
some patterns can be easily recognized: while single peak 
shapes are mostly associated with low-degeneracy se- 
quences, highly structured ones seem to be not correlated 
to their degeneracy and they are responsible for the P(q) 
broadness. Among the latter the double-peak shape dom- 
inates, especially for the sequences with higher entropy: 
the higher q peak gives information about the typical 
distance between two structures in the same state [ p"T| |, 
while the lower q one can be associated with the rising of 
a backbone that is the set of persistent links com- 
mon to all the GSS (already found in [fL3|). The position 
of this second peak strongly fluctuates from sample to 
sample, giving rise to the long tail in the P(q), like spin 
glass models in external field. 

In order to understand whether a true transition hap- 
pens in this model, we have measured the order param- 
eter introduced in |l4|] 



A = 



lx 2 



KJ 



Ixn] 



[xn] 1 



(4) 



where \ti — L(Aqji) 2 , being Aqiz the width of Pn(q). 
The A parameter measures how much the P-ji (q) changes 
from sample to sample. The crossing point of different 
curves in Fig. ^signals the existence of a low-temperature 
spin glass phase, where the P-n{q) become non-self- 
averaging (analogous results has been obtained with the 
4-letter model) . In this phase the RNA is "folded" , that 
is the number of links is nearly the maximum allowed. 

The critical temperature of this transition seems to be 
located between T = 0.1 and T — 0.15. We have de- 
termined the best estimates for T c and for the critical 
exponent rj requiring the best collapse for the suscepti- 
bility x = [xiz] data, scaled assuming the usual finite-size 
formula X = L 2 ^ f(L^/ u (T - T c )) (see inset of Fig. |). 
We obtain the values T c ~ 0.13 and r\ ~ 1.41. We stress 
that we also tried to collapse the data fixing T c = 0, but 
the result was very poor. 
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FIG. 3. The crossing of the A parameter signals the tran- 
sition to a glassy phase with the replica symmetry broken. 
Inset: scaling plot of xL v ~ 2 versus L~ 1/v (T - T c ). 

The critical temperature seems to be below the one 
we found by the study of thcrmodynamical quantities. 
However given the high value of v, the critical region 
should shrink as L~ x l v and then all the region around 
T = 0.1 — 0.3 is critical as suggested by the wide separa- 
tion of the two peaks in d\C (lower inset of Fig. |l|). 

We have presented strong numerical evidences for a 
phase transition in a random model for the RNA sec- 
ondary structure. It is very important to stress that the 
thcrmodynamical limit is not so interesting for biologi- 
cal RNAs, which are at most thousands bases long. As 
a consequence our sizes are in principle directly compa- 
rable with a large number of biological molecules. Our 
findings about the broadness of the P(q) could suggest 
for the existence of zero-energy fluctuations of the order 
of the volume, which is a well known behavior in spin 
glass and disordered systems. In |l5|], for example, it has 
been found that the matching problem (which is disor- 
dered and frustrated) has low-energy excitations of order 
y/L- These excitations becomes irrelevant in the thermo- 
dynamical limit, but they are a key ingredient in order 
to correctly describe finite systems. In the low tempera- 
ture region of our model (from T = 0.13 down to T = 0) 
X oc L o e and it has strong fluctuations from sample to 
sample. This situation can be described by an effective 
breaking of the replica symmetry with a strength which 
goes to zero as L~ 0A according to the slow shrinking of 
P(q) at T = 0. Incidentally we note that in all this tem- 
perature region the critical exponent of \ is the same, 
as suggested from the scaling plot in the inset of Fig. [|. 
Moreover we have also measured the G cumulants defined 
m || and we have verified that it goes to the value i 
as the temperature goes to zero coherently with a replica 
symmetry breaking scenario. 

In conclusions, we have found a glassy transition in a 
simplified random model for the RNA secondary struc- 
tures. This transition corresponds to the breaking of the 
configurational space in many disconnected regions (er- 
godicity breaking). In terms of random RNA folding, 



this means that below the critical temperature almost ev- 
ery sequence folds (all the low-energy structures are very 
compact), but very often not in a single structure. The 
ergodicity breaking is of primary importance also for the 
folding dynamics, that may become very slow (glassy). 

We have checked that the transition disappears as soon 
as we remove the constraint of not having links on short 
distances (maybe this is a pathology of the 2-letters code) 
or as soon as we set all the interaction energies to the 
same value. These facts suggest us that the glassy tran- 
sition is mainly due to the freezing of some strong links, 
which then force the rest of the interactions, aided by 
the geometrical constraints. A cooperation phenomenon 
between interaction energies heterogeneity and geometri- 
cal constraints has been already observed in DNA mod- 
els@. 

We warmly thank R. Zecchina for many interesting 
discussions and for a careful reading of the manuscript. 
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